In this post, we introduce an easy and practical way to deal with not-reached items in low-stakes assessments. First, we describe a polytomous scoring approach for handling not-reached items in computerized low-stakes assessments, and then we demonstrate how to implement it in R. This approach makes no explicit assumption about the association between not-reached items and student ability; instead, it considers only optimal time use, and hence engagement and effortful response behavior, when dealing with not-reached items.
(10 min read)
Low-stakes assessments (e.g., formative assessments and progress monitoring measures in K-12) usually have no direct consequences for students. Therefore, some students may not show effortful response behavior when attempting the items on such assessments and may leave some items unanswered. These items are typically referred to as not-reached items. For example, some students may try to answer all of the items rapidly and complete the assessment in an unrealistically short amount of time. Conversely, some students may spend an unrealistically long amount of time on each item and may not finish answering all of the items within the allotted time. Furthermore, students may leave items unanswered due to test speededness, the situation where the allotted time does not allow a large number of students to fully consider all items on the assessment (Lu and Sireci 2007).
In practice, not-reached items are often treated as either incorrect or not-administered (i.e., NA) in the estimation of item and person parameters. However, when the proportion of not-reached items is high, these approaches may yield biased parameter estimates and thereby threaten the validity of assessment results. To date, researchers have proposed various model-based approaches to deal with not-reached items, such as modeling valid responses and not-reached items jointly in a tree-based item response theory (IRT) model (e.g., Debeer, Janssen, and De Boeck (2017)) or modeling proficiency and the tendency to omit items as distinct latent traits (e.g., Pohl, Gräfe, and Rose (2014)). However, these are typically complex models that would not be easy to use in operational settings.
Response time spent on each item in an assessment is often considered a strong proxy for students’ engagement with the items (e.g., Kuhfeld and Soland (2020), Pohl, Ulitzsch, and Davier (2019), Rios et al. (2017)). To date, several researchers have demonstrated the utility of response times in reducing the effects of non-effortful response behavior such as rapid guessing (e.g., Kuhfeld and Soland (2020), Pohl, Ulitzsch, and Davier (2019), Wise and Kong (2005)). By identifying and removing responses where rapid guessing occurred, the accuracy of item and person parameter estimates can be improved without having to apply a complex modeling approach.
In this post, we will demonstrate an alternative method that considers not only students with rapid guessing behavior but also students who spend too much time on each item and thereby leave many items unanswered. In the following sections, we will briefly describe how our approach works and then demonstrate its use in R.
In our recent study (Gorgun and Bulut 2021), we proposed a new scoring approach that utilizes response times to transform dichotomous responses into polytomous responses. With our scoring approach, students are able to receive partial credit on their responses depending on how accurately and rapidly they answer the items. This approach combines speed and accuracy in the scoring process to alleviate the negative impact of not-reached items on the estimation of item and person parameters.
To conceptualize our scoring approach, we introduce the concept of optimal time, which refers to spending a reasonable amount of time when responding to an item. Optimal time allows us to make a distinction between students who spend optimal time but miss the item and students who spend too much time on the item and yet answer it incorrectly. By using response time, we group students into three categories: rapid guessers, who spend too little time on an item; optimal time users, who spend a reasonable amount of time; and slow respondents, who spend too much time.
If an assessment is timed, students are expected to adjust their speed to attempt as many items as possible within the allotted time. Therefore, spending too little time (rapid guessers) or too much time (slow respondents) on a given item can be considered an outcome of disengaged response behavior. Our scoring approach enables assigning partial credit to optimal time users who answer the item incorrectly but spend optimal time when attempting it. Thus, it allows for a more fine-grained analysis of response behavior.
The polytomous scoring approach can be applied using the following steps:
1. We separate response times for correct and incorrect responses and then find two median response times for each item: one for correct responses and another for incorrect responses. The median response time is used to avoid the influence of outliers in the response time distribution.
2. We use the normative threshold (NT) approach introduced by Wise and Kong (2005). This process gives us two cut-off values that divide the response time distribution into three regions: rapid guessers, optimal time users, and slow respondents. For example, we can use 25% and 175% of the median response times to specify the optimal time interval.
3. After finding the cut-off points for the response time distributions of each item, we select a scoring range. Here we can choose a scoring range of 0 to 3 points or 0 to 4 points.
4. We determine how to deal with not-reached items. We can choose to treat not-reached items as either not-administered (i.e., missing) or incorrect.
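As a quick sketch of the first two steps, the cut-off values and response time categories for a single item could be computed as follows (the resp and rt vectors here are hypothetical):

```r
# Hypothetical responses (1 = correct, 0 = incorrect) and
# response times (in seconds) for one item
resp <- c(1, 0, 1, 1, 0, 1, 0, 1)
rt   <- c(4, 12, 35, 51, 40, 90, 130, 48)

# Step 1: median response times for correct and incorrect responses
med.correct   <- median(rt[resp == 1])
med.incorrect <- median(rt[resp == 0])

# Step 2: normative threshold (NT) cut-offs at 25% and 175% of the medians
cutoffs.correct   <- med.correct * c(0.25, 1.75)
cutoffs.incorrect <- med.incorrect * c(0.25, 1.75)

# Classify correct responders into rapid / optimal / slow time users
category <- cut(rt[resp == 1],
                breaks = c(-Inf, cutoffs.correct, Inf),
                labels = c("rapid", "optimal", "slow"))
# 4 sec -> rapid; 35, 48, 51 sec -> optimal; 90 sec -> slow
```

The same classification is then repeated with the incorrect-response cut-offs for students who missed the item.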
Now, let’s see how the polytomous scoring approach works in R.
To illustrate the polytomous scoring approach, we use response data from a sample of 5000 students who participated in a hypothetical assessment with 40 items. In the response data, correct responses are coded as 1, incorrect responses as 0, not-answered items as 8, and not-reached items as 9.
The data also includes students’ response time (in seconds) for each item. The data as a comma-separated-values file (dichotomous_data.csv) is available here.
Now let’s import the data into R and then preview its content.
library("rmarkdown")  # paged_table() comes from the rmarkdown package
data <- read.csv("dichotomous_data.csv", header = TRUE)
paged_table(data[,1:40], options = list(cols.print = 12))
paged_table(data[,41:80], options = list(cols.print = 12))
Next, we create a scoring function to transform dichotomous responses into polytomous responses based on the polytomous scoring approach described above. The polyscore function requires the following arguments:
response: A vector of students’ responses to an item
time: A vector of students’ response times on the same item
max.score: Maximum score for polytomous items (3 for 0-1-2-3 or 4 for 0-1-2-3-4)
not.reached: Response value for not-reached items (in our data, it is “9”)
not.answered: Response value for not-answered items (in our data, it is “8”). These responses are automatically recoded as 0 (i.e., incorrect).
na.handle: Treatment of not-reached responses. If “IN,” not-reached responses become 0 (i.e., incorrect); if “NA,” then not-reached responses become NA (i.e., missing).
correct: The cut-off proportions used to identify rapid guessers and slow respondents who answered the item correctly. The default cut-off proportions are 0.25 and 1.75, i.e., 25% and 175% of the median response time for correct responses.
incorrect: The cut-off proportions used to identify rapid guessers and slow respondents who answered the item incorrectly. The default cut-off proportions are 0.25 and 1.75, i.e., 25% and 175% of the median response time for incorrect responses.
polyscore <- function(response, time, max.score, na.handle = "NA", not.reached, not.answered,
                      correct = c(0.25, 1.75), incorrect = c(0.25, 1.75)) {
  # Find response time thresholds from the median response times
  # of correct and incorrect responses
  median.time.correct1 <- median(time[which(response == 1)], na.rm = TRUE) * correct[1]
  median.time.correct2 <- median(time[which(response == 1)], na.rm = TRUE) * correct[2]
  median.time.incorrect1 <- median(time[which(response == 0)], na.rm = TRUE) * incorrect[1]
  median.time.incorrect2 <- median(time[which(response == 0)], na.rm = TRUE) * incorrect[2]
  # Recode dichotomous responses as polytomous
  if (max.score == 3) {
    # Correct responses: rapid or slow = 2; optimal = 3
    response <- ifelse(response == 1 & time < median.time.correct1, 2,
                ifelse(response == 1 & time > median.time.correct2, 2,
                ifelse(response == 1 &
                       time >= median.time.correct1 &
                       time <= median.time.correct2, 3, response)))
    # Incorrect responses: rapid or slow = 0; optimal = 1
    response <- ifelse(response == 0 & time < median.time.incorrect1, 0,
                ifelse(response == 0 & time > median.time.incorrect2, 0,
                ifelse(response == 0 &
                       time >= median.time.incorrect1 &
                       time <= median.time.incorrect2, 1, response)))
  } else if (max.score == 4) {
    # Correct responses: rapid = 2; slow = 3; optimal = 4
    response <- ifelse(response == 1 & time < median.time.correct1, 2,
                ifelse(response == 1 & time > median.time.correct2, 3,
                ifelse(response == 1 &
                       time >= median.time.correct1 &
                       time <= median.time.correct2, 4, response)))
    # Incorrect responses: rapid or slow = 0; optimal = 1
    response <- ifelse(response == 0 & time < median.time.incorrect1, 0,
                ifelse(response == 0 & time > median.time.incorrect2, 0,
                ifelse(response == 0 &
                       time >= median.time.incorrect1 &
                       time <= median.time.incorrect2, 1, response)))
  }
  # Set not-answered responses as incorrect
  if (!is.null(not.answered)) {
    response.recoded <- ifelse(response == not.answered, 0, response)
  } else {
    response.recoded <- response
  }
  # Set not-reached responses as NA or incorrect
  if (na.handle == "IN") {
    response.recoded <- ifelse(response.recoded == not.reached, 0, response.recoded)
  } else {
    response.recoded <- ifelse(response.recoded == not.reached, NA, response.recoded)
  }
  return(response.recoded)
}
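To check that the function behaves as intended, we can run it on a small hypothetical item with eight students (using the codes 8 = not-answered and 9 = not-reached, with missing response times for the last two students):

```r
# Hypothetical responses and response times (in seconds) for one item
resp <- c(1, 1, 1, 0, 0, 0, 8, 9)
rt   <- c(5, 30, 80, 6, 35, 90, NA, NA)

polyscore(resp, rt, max.score = 3, na.handle = "NA",
          not.reached = 9, not.answered = 8)
# 2 3 2 0 1 0 0 NA
```

The correct responses become 2 (rapid), 3 (optimal), and 2 (slow); the incorrect responses become 0, 1, and 0; the not-answered response is recoded as 0; and the not-reached response becomes NA.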
Before we apply the polyscore function to the full dataset, let’s see how rapid guessers, slow respondents, and optimal time users are identified using one of the items (item 1).
library("patchwork")
library("ggplot2")
# Response time distribution for correct responses
p1 <- ggplot(data = data[data$item_1 == 1, ],
             aes(x = rt_1)) +
  geom_histogram(color = "white",
                 fill = "steelblue",
                 bins = 40) +
  geom_vline(xintercept = median(data[data$item_1 == 1, "rt_1"]) * 0.25,
             linetype = "dashed", color = "red", size = 1) +
  geom_vline(xintercept = median(data[data$item_1 == 1, "rt_1"]) * 1.75,
             linetype = "dashed", color = "red", size = 1) +
  labs(x = "Response Time for Item 1 (Correct)") +
  theme_bw()
# Response time distribution for incorrect responses
p2 <- ggplot(data = data[data$item_1 == 0, ],
             aes(x = rt_1)) +
  geom_histogram(color = "white",
                 fill = "steelblue",
                 bins = 40) +
  geom_vline(xintercept = median(data[data$item_1 == 0, "rt_1"]) * 0.25,
             linetype = "dashed", color = "red", size = 1) +
  geom_vline(xintercept = median(data[data$item_1 == 0, "rt_1"]) * 1.75,
             linetype = "dashed", color = "red", size = 1) +
  labs(x = "Response Time for Item 1 (Incorrect)") +
  theme_bw()
(p1 / p2)

Now we can go ahead and implement polytomous scoring with our data. First, we separate the response and response time portions of the data.
resp_data <- data[, 1:40]
time_data <- data[, 41:80]
Next, we apply the polytomous scoring approach using different combinations of max.score and na.handle. We apply the polyscore function to each item using a loop.
polydata3_NA <- matrix(NA, nrow = nrow(resp_data), ncol = ncol(resp_data))
polydata3_IN <- matrix(NA, nrow = nrow(resp_data), ncol = ncol(resp_data))
polydata4_NA <- matrix(NA, nrow = nrow(resp_data), ncol = ncol(resp_data))
polydata4_IN <- matrix(NA, nrow = nrow(resp_data), ncol = ncol(resp_data))
for (i in 1:ncol(resp_data)) {
  polydata3_NA[,i] <- polyscore(resp_data[,i], time_data[,i], max.score = 3,
                                na.handle = "NA", not.reached = 9, not.answered = 8,
                                correct = c(0.25, 1.75), incorrect = c(0.25, 1.75))
  polydata3_IN[,i] <- polyscore(resp_data[,i], time_data[,i], max.score = 3,
                                na.handle = "IN", not.reached = 9, not.answered = 8,
                                correct = c(0.25, 1.75), incorrect = c(0.25, 1.75))
  polydata4_NA[,i] <- polyscore(resp_data[,i], time_data[,i], max.score = 4,
                                na.handle = "NA", not.reached = 9, not.answered = 8,
                                correct = c(0.25, 1.75), incorrect = c(0.25, 1.75))
  polydata4_IN[,i] <- polyscore(resp_data[,i], time_data[,i], max.score = 4,
                                na.handle = "IN", not.reached = 9, not.answered = 8,
                                correct = c(0.25, 1.75), incorrect = c(0.25, 1.75))
}
Let’s quickly see what one of the recoded datasets looks like after applying the polytomous scoring.
polydata3_NA <- as.data.frame(polydata3_NA)
head(polydata3_NA)
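To get a quick overview of how the score categories are distributed across the entire assessment, we can also tabulate all recoded responses (here for the 0-to-3 version, where not-reached items are treated as missing):

```r
# Frequency of each polytomous score category across all items (NA = not-reached)
table(unlist(polydata3_NA), useNA = "ifany")
```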
We will demonstrate how to estimate item and person parameters with an item response theory (IRT) approach. However, note that this scoring approach is flexible enough to be applied within classical test theory (CTT) as well. When applied in the CTT framework, a high score indicates that the student’s combined level of ability and engagement in the assessment was high. With this approach, speededness is no longer a nuisance variable but is used in the operationalization of ability (Tijmstra and Bolsinova 2018).
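As a minimal sketch of the CTT option, a total score can be obtained by simply summing the polytomous item scores (here using the 0-to-3 data with not-reached items scored as incorrect, so that no scores are missing):

```r
# CTT-style total score: sum of polytomous item scores for each student
ctt_scores <- rowSums(polydata3_IN)
summary(ctt_scores)
```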
We will use the mirt package to estimate the item and person parameters. Below, we first define a unidimensional model in which all 40 items load on a single latent trait (F).
library("mirt")
model <- 'F = 1-40'
### Polytomous: GRM
results.grm <- mirt(data=polydata3_NA, model=model, itemtype="graded", SE=TRUE, verbose=FALSE)
### Item parameters
coef.grm <- coef(results.grm, IRTpars=TRUE, simplify=TRUE)
items.grm <- as.data.frame(coef.grm$items)
### Person parameters
theta.grm <- matrix(fscores(results.grm, method='EAP'))
Finally, let’s examine the test information function and the standard error of measurement based on the item and person parameters estimated with the polytomous scoring approach.
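A convenient option is mirt’s built-in plot method, which can display the test information function and the conditional standard errors from the fitted GRM in a single plot:

```r
# Test information and standard errors across the latent trait continuum
plot(results.grm, type = 'infoSE')
```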
